Overview

Dataset statistics

Number of variables: 16
Number of observations: 1738
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 107.1 KiB
Average record size in memory: 63.1 B
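
The average record size appears to be the total in-memory size divided by the number of observations. A minimal check with the figures above, assuming 1 KiB = 1024 B (the variable names are illustrative):

```python
# Figures copied from the dataset statistics above.
total_size_kib = 107.1
n_observations = 1738

# Assumption: the report uses binary units, so 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(round(avg_record_bytes, 1))
```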

Variable types

Numeric: 8
Categorical: 8

Alerts

country has constant value "0" (Constant)
df_index is highly correlated with id and 1 other field (High correlation)
id is highly correlated with df_index and 1 other field (High correlation)
state is highly correlated with df_index and 1 other field (High correlation)
pollutant_min is highly correlated with pollutant_max and 1 other field (High correlation)
pollutant_max is highly correlated with pollutant_min and 2 other fields (High correlation)
pollutant_avg is highly correlated with pollutant_min and 2 other fields (High correlation)
pollutant_id_NH3 is highly correlated with pollutant_max and 1 other field (High correlation)
pollutant_min is highly correlated with pollutant_max and 2 other fields (High correlation)
pollutant_max is highly correlated with pollutant_min and 1 other field (High correlation)
pollutant_avg is highly correlated with pollutant_min and 1 other field (High correlation)
pollutant_id_PM10 is highly correlated with pollutant_min (High correlation)
country is highly correlated with pollutant_id_SO2 and 6 other fields (High correlation)
pollutant_id_SO2 is highly correlated with country (High correlation)
pollutant_id_NH3 is highly correlated with country (High correlation)
pollutant_id_NO2 is highly correlated with country (High correlation)
pollutant_id_PM2.5 is highly correlated with country (High correlation)
pollutant_id_CO is highly correlated with country (High correlation)
pollutant_id_OZONE is highly correlated with country (High correlation)
pollutant_id_PM10 is highly correlated with country (High correlation)
df_index is highly correlated with id and 3 other fields (High correlation)
id is highly correlated with df_index and 3 other fields (High correlation)
state is highly correlated with df_index and 3 other fields (High correlation)
city is highly correlated with df_index and 3 other fields (High correlation)
station is highly correlated with df_index and 3 other fields (High correlation)
pollutant_min is highly correlated with pollutant_max and 3 other fields (High correlation)
pollutant_max is highly correlated with pollutant_min and 4 other fields (High correlation)
pollutant_avg is highly correlated with pollutant_min and 3 other fields (High correlation)
pollutant_id_NH3 is highly correlated with pollutant_max (High correlation)
pollutant_id_PM10 is highly correlated with pollutant_min and 2 other fields (High correlation)
pollutant_id_PM2.5 is highly correlated with pollutant_min and 2 other fields (High correlation)
df_index has unique values (Unique)
id has unique values (Unique)
state has 28 (1.6%) zeros (Zeros)

Reproduction

Analysis started: 2021-10-31 11:23:42.428467
Analysis finished: 2021-10-31 11:24:03.408325
Duration: 20.98 seconds
Software version: pandas-profiling v3.1.0
Download configuration: config.json

Variables

df_index
Real number (ℝ≥0)

HIGH CORRELATION
UNIQUE

Distinct: 1738
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 906.6317606
Minimum: 0
Maximum: 1835
Zeros: 1
Zeros (%): 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 13.7 KiB

Quantile statistics

Minimum: 0
5-th percentile: 86.85
Q1: 445.25
median: 904.5
Q3: 1366.75
95-th percentile: 1728.15
Maximum: 1835
Range: 1835
Interquartile range (IQR): 921.5

Descriptive statistics

Standard deviation: 530.7635132
Coefficient of variation (CV): 0.585423472
Kurtosis: -1.218775098
Mean: 906.6317606
Median Absolute Deviation (MAD): 461
Skewness: 0.01509757116
Sum: 1575726
Variance: 281709.9069
Monotonicity: Strictly increasing
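
Several of the figures reported for df_index follow from one another. The sketch below re-derives them from the values printed above; it assumes the usual definitions (CV = standard deviation / mean, IQR = Q3 - Q1, range = maximum - minimum, variance = standard deviation squared), and the variable names are illustrative.

```python
# Values copied from the df_index summary above.
mean = 906.6317606
std = 530.7635132
q1, q3 = 445.25, 1366.75
minimum, maximum = 0, 1835

cv = std / mean                  # coefficient of variation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range
variance = std ** 2              # variance from the standard deviation

print(cv, iqr, value_range, variance)
```

The same relationships can be checked against the summaries of the other numeric variables.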
Histogram with fixed size bins (bins=50)

Common values

Value                Count  Frequency (%)
0                    1      0.1%
1215                 1      0.1%
1226                 1      0.1%
1225                 1      0.1%
1224                 1      0.1%
1223                 1      0.1%
1222                 1      0.1%
1221                 1      0.1%
1220                 1      0.1%
1219                 1      0.1%
Other values (1728)  1728   99.4%

Minimum 10 values

Value  Count  Frequency (%)
0      1      0.1%
1      1      0.1%
2      1      0.1%
3      1      0.1%
4      1      0.1%
5      1      0.1%
6      1      0.1%
7      1      0.1%
8      1      0.1%
9      1      0.1%

Maximum 10 values

Value  Count  Frequency (%)
1835   1      0.1%
1834   1      0.1%
1833   1      0.1%
1832   1      0.1%
1831   1      0.1%
1830   1      0.1%
1829   1      0.1%
1828   1      0.1%
1827   1      0.1%
1826   1      0.1%

id
Real number (ℝ≥0)

HIGH CORRELATION
UNIQUE

Distinct: 1738
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 907.6317606
Minimum: 1
Maximum: 1836
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 13.7 KiB

Quantile statistics

Minimum: 1
5-th percentile: 87.85
Q1: 446.25
median: 905.5
Q3: 1367.75
95-th percentile: 1729.15
Maximum: 1836
Range: 1835
Interquartile range (IQR): 921.5

Descriptive statistics

Standard deviation: 530.7635132
Coefficient of variation (CV): 0.584778471
Kurtosis: -1.218775098
Mean: 907.6317606
Median Absolute Deviation (MAD): 461
Skewness: 0.01509757116
Sum: 1577464
Variance: 281709.9069
Monotonicity: Strictly increasing
Histogram with fixed size bins (bins=50)

Common values

Value                Count  Frequency (%)
1                    1      0.1%
1216                 1      0.1%
1227                 1      0.1%
1226                 1      0.1%
1225                 1      0.1%
1224                 1      0.1%
1223                 1      0.1%
1222                 1      0.1%
1221                 1      0.1%
1220                 1      0.1%
Other values (1728)  1728   99.4%

Minimum 10 values

Value  Count  Frequency (%)
1      1      0.1%
2      1      0.1%
3      1      0.1%
4      1      0.1%
5      1      0.1%
6      1      0.1%
7      1      0.1%
8      1      0.1%
9      1      0.1%
10     1      0.1%

Maximum 10 values

Value  Count  Frequency (%)
1836   1      0.1%
1835   1      0.1%
1834   1      0.1%
1833   1      0.1%
1832   1      0.1%
1831   1      0.1%
1830   1      0.1%
1829   1      0.1%
1828   1      0.1%
1827   1      0.1%

country
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 98.6 KiB

Value  Count
0      1738

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value  Count  Frequency (%)
0      1738   100.0%

Length

Histogram of lengths of the category

Pie chart

Value  Count  Frequency (%)
0      1738   100.0%

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

state
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 26
Distinct (%): 1.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 12.87802071
Minimum: 0
Maximum: 25
Zeros: 28
Zeros (%): 1.6%
Negative: 0
Negative (%): 0.0%
Memory size: 6.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 2
Q1: 6
median: 11
Q3: 21
95-th percentile: 24
Maximum: 25
Range: 25
Interquartile range (IQR): 15

Descriptive statistics

Standard deviation: 7.639101168
Coefficient of variation (CV): 0.5931890729
Kurtosis: -1.292040648
Mean: 12.87802071
Median Absolute Deviation (MAD): 6
Skewness: 0.3092575676
Sum: 22382
Variance: 58.35586666
Monotonicity: Increasing
Histogram with fixed size bins (bins=26)

Common values

Value              Count  Frequency (%)
24                 287    16.5%
5                  242    13.9%
7                  195    11.2%
13                 188    10.8%
10                 183    10.5%
12                 82     4.7%
6                  80     4.6%
25                 69     4.0%
2                  66     3.8%
20                 66     3.8%
Other values (16)  280    16.1%

Minimum 10 values

Value  Count  Frequency (%)
0      28     1.6%
1      13     0.7%
2      66     3.8%
3      14     0.8%
4      5      0.3%
5      242    13.9%
6      80     4.6%
7      195    11.2%
8      4      0.2%
9      1      0.1%

Maximum 10 values

Value  Count  Frequency (%)
25     69     4.0%
24     287    16.5%
23     2      0.1%
22     35     2.0%
21     52     3.0%
20     66     3.8%
19     51     2.9%
18     7      0.4%
17     13     0.7%
16     7      0.4%

city
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 142
Distinct (%): 8.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 63.64729574
Minimum: 0
Maximum: 141
Zeros: 2
Zeros (%): 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 6.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 5
Q1: 33
median: 58
Q3: 93
95-th percentile: 131.15
Maximum: 141
Range: 141
Interquartile range (IQR): 60

Descriptive statistics

Standard deviation: 38.62069801
Coefficient of variation (CV): 0.6067924419
Kurtosis: -1.09428083
Mean: 63.64729574
Median Absolute Deviation (MAD): 30.5
Skewness: 0.2378790807
Sum: 110619
Variance: 1491.558315
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value               Count  Frequency (%)
33                  242    13.9%
93                  112    6.4%
17                  56     3.2%
28                  42     2.4%
83                  41     2.4%
77                  37     2.1%
107                 35     2.0%
58                  35     2.0%
2                   35     2.0%
1                   34     2.0%
Other values (132)  1069   61.5%

Minimum 10 values

Value  Count  Frequency (%)
0      2      0.1%
1      34     2.0%
2      35     2.0%
3      3      0.2%
4      7      0.4%
5      7      0.4%
6      7      0.4%
7      7      0.4%
8      7      0.4%
9      7      0.4%

Maximum 10 values

Value  Count  Frequency (%)
141    7      0.4%
140    7      0.4%
139    7      0.4%
138    7      0.4%
137    7      0.4%
136    7      0.4%
135    27     1.6%
134    5      0.3%
133    7      0.4%
132    6      0.3%

station
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 280
Distinct (%): 16.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 139.9131185
Minimum: 0
Maximum: 280
Zeros: 7
Zeros (%): 0.4%
Negative: 0
Negative (%): 0.0%
Memory size: 6.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 14
Q1: 70
median: 140.5
Q3: 210
95-th percentile: 267
Maximum: 280
Range: 280
Interquartile range (IQR): 140

Descriptive statistics

Standard deviation: 80.72199348
Coefficient of variation (CV): 0.5769437085
Kurtosis: -1.187803526
Mean: 139.9131185
Median Absolute Deviation (MAD): 69.5
Skewness: 0.005775083739
Sum: 243169
Variance: 6516.040231
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value               Count  Frequency (%)
215                 7      0.4%
131                 7      0.4%
19                  7      0.4%
36                  7      0.4%
39                  7      0.4%
53                  7      0.4%
109                 7      0.4%
113                 7      0.4%
140                 7      0.4%
147                 7      0.4%
Other values (270)  1668   96.0%

Minimum 10 values

Value  Count  Frequency (%)
0      7      0.4%
1      7      0.4%
2      6      0.3%
3      7      0.4%
4      5      0.3%
5      7      0.4%
6      5      0.3%
7      6      0.3%
8      3      0.2%
9      7      0.4%

Maximum 10 values

Value  Count  Frequency (%)
280    7      0.4%
279    7      0.4%
278    6      0.3%
277    6      0.3%
276    6      0.3%
275    6      0.3%
274    7      0.4%
273    6      0.3%
272    2      0.1%
271    7      0.4%

pollutant_min
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 149
Distinct (%): 8.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 28.41426928
Minimum: 1
Maximum: 217
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 13.7 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 5
median: 14
Q3: 39
95-th percentile: 107.15
Maximum: 217
Range: 216
Interquartile range (IQR): 34

Descriptive statistics

Standard deviation: 34.40381054
Coefficient of variation (CV): 1.210793429
Kurtosis: 3.413806682
Mean: 28.41426928
Median Absolute Deviation (MAD): 12
Skewness: 1.851160344
Sum: 49384
Variance: 1183.62218
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value               Count  Frequency (%)
1                   148    8.5%
2                   116    6.7%
4                   86     4.9%
3                   82     4.7%
5                   70     4.0%
6                   67     3.9%
8                   56     3.2%
10                  44     2.5%
7                   42     2.4%
9                   35     2.0%
Other values (139)  992    57.1%

Minimum 10 values

Value  Count  Frequency (%)
1      148    8.5%
2      116    6.7%
3      82     4.7%
4      86     4.9%
5      70     4.0%
6      67     3.9%
7      42     2.4%
8      56     3.2%
9      35     2.0%
10     44     2.5%

Maximum 10 values

Value  Count  Frequency (%)
217    1      0.1%
200    1      0.1%
193    1      0.1%
184    1      0.1%
182    1      0.1%
175    1      0.1%
172    1      0.1%
161    1      0.1%
155    2      0.1%
153    1      0.1%

pollutant_max
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 340
Distinct (%): 19.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 96.87341772
Minimum: 1
Maximum: 500
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 13.7 KiB

Quantile statistics

Minimum: 1
5-th percentile: 4
Q1: 21
median: 63
Q3: 124
95-th percentile: 335
Maximum: 500
Range: 499
Interquartile range (IQR): 103

Descriptive statistics

Standard deviation: 104.7650939
Coefficient of variation (CV): 1.081463794
Kurtosis: 2.488366121
Mean: 96.87341772
Median Absolute Deviation (MAD): 48
Skewness: 1.68591993
Sum: 168366
Variance: 10975.7249
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value               Count  Frequency (%)
6                   47     2.7%
8                   32     1.8%
3                   29     1.7%
2                   29     1.7%
10                  28     1.6%
9                   26     1.5%
7                   25     1.4%
5                   25     1.4%
4                   25     1.4%
11                  23     1.3%
Other values (330)  1449   83.4%

Minimum 10 values

Value  Count  Frequency (%)
1      13     0.7%
2      29     1.7%
3      29     1.7%
4      25     1.4%
5      25     1.4%
6      47     2.7%
7      25     1.4%
8      32     1.8%
9      26     1.5%
10     28     1.6%

Maximum 10 values

Value  Count  Frequency (%)
500    8      0.5%
495    1      0.1%
489    1      0.1%
487    1      0.1%
477    1      0.1%
474    1      0.1%
473    1      0.1%
470    1      0.1%
469    1      0.1%
467    1      0.1%

pollutant_avg
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 237
Distinct (%): 13.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 54.10069045
Minimum: 1
Maximum: 314
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 13.7 KiB

Quantile statistics

Minimum: 1
5-th percentile: 3
Q1: 12
median: 31
Q3: 70
95-th percentile: 194.15
Maximum: 314
Range: 313
Interquartile range (IQR): 58

Descriptive statistics

Standard deviation: 60.82415825
Coefficient of variation (CV): 1.124276931
Kurtosis: 2.54020862
Mean: 54.10069045
Median Absolute Deviation (MAD): 23
Skewness: 1.72497588
Sum: 94027
Variance: 3699.578226
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value               Count  Frequency (%)
5                   52     3.0%
4                   50     2.9%
2                   46     2.6%
6                   46     2.6%
12                  40     2.3%
3                   39     2.2%
7                   38     2.2%
10                  35     2.0%
8                   34     2.0%
1                   30     1.7%
Other values (227)  1328   76.4%

Minimum 10 values

Value  Count  Frequency (%)
1      30     1.7%
2      46     2.6%
3      39     2.2%
4      50     2.9%
5      52     3.0%
6      46     2.6%
7      38     2.2%
8      34     2.0%
9      30     1.7%
10     35     2.0%

Maximum 10 values

Value  Count  Frequency (%)
314    1      0.1%
309    1      0.1%
308    1      0.1%
297    1      0.1%
293    1      0.1%
290    1      0.1%
289    1      0.1%
284    1      0.1%
282    1      0.1%
280    2      0.1%

pollutant_id_CO
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 98.6 KiB

Value  Count
0      1469
1      269

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value  Count  Frequency (%)
0      1469   84.5%
1      269    15.5%

Length

Histogram of lengths of the category

Pie chart

Value  Count  Frequency (%)
0      1469   84.5%
1      269    15.5%

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

pollutant_id_NH3
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 98.6 KiB

Value  Count
0      1520
1      218

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 1
5th row: 0

Common Values

Value  Count  Frequency (%)
0      1520   87.5%
1      218    12.5%

Length

Histogram of lengths of the category

Pie chart

Value  Count  Frequency (%)
0      1520   87.5%
1      218    12.5%

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

pollutant_id_NO2
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 98.6 KiB

Value  Count
0      1487
1      251

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 1
4th row: 0
5th row: 0

Common Values

Value  Count  Frequency (%)
0      1487   85.6%
1      251    14.4%

Length

Histogram of lengths of the category

Pie chart

Value  Count  Frequency (%)
0      1487   85.6%
1      251    14.4%

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

pollutant_id_OZONE
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 98.6 KiB

Value  Count
0      1481
1      257

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value  Count  Frequency (%)
0      1481   85.2%
1      257    14.8%

Length

Histogram of lengths of the category

Pie chart

Value  Count  Frequency (%)
0      1481   85.2%
1      257    14.8%

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

pollutant_id_PM10
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 98.6 KiB

Value  Count
0      1493
1      245

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 1
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value  Count  Frequency (%)
0      1493   85.9%
1      245    14.1%

Length

Histogram of lengths of the category

Pie chart

Value  Count  Frequency (%)
0      1493   85.9%
1      245    14.1%

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

pollutant_id_PM2.5
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 98.6 KiB

Value  Count
0      1486
1      252

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value  Count  Frequency (%)
0      1486   85.5%
1      252    14.5%

Length

Histogram of lengths of the category

Pie chart

Value  Count  Frequency (%)
0      1486   85.5%
1      252    14.5%

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

pollutant_id_SO2
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 98.6 KiB

Value  Count
0      1492
1      246

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 1

Common Values

Value  Count  Frequency (%)
0      1492   85.8%
1      246    14.2%

Length

Histogram of lengths of the category

Pie chart

Value  Count  Frequency (%)
0      1492   85.8%
1      246    14.2%

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

Interactions

[64 pairwise interaction plots, one per ordered pair of the 8 numeric variables; images not preserved]

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
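
The calculation just described can be sketched in plain Python. This is an illustration under stated assumptions, not the report's own implementation: the helper names are hypothetical, and ties are handled by average ranks, a common convention that the text above does not specify.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared_rank
        start = end + 1
    return ranks


def pearson_r(x, y):
    """Covariance over the product of standard deviations (the 1/n factors cancel)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson_r(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]  # y = x**3: monotonic but not linear
print(spearman_rho(x, y))  # approximately 1
print(pearson_r(x, y))     # noticeably below 1
```

For this monotonic but nonlinear example, ρ is 1 up to rounding while r is lower, which is the distinction the description draws.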

Pearson's r

Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that for a linear relationship the steepness of the line (its angle to the x-axis) does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
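
A short sketch of this formula, which also checks the invariance claim by shifting and rescaling both variables. The function name and sample data are made up for illustration, and the scale factors are assumed to be positive (a negative factor would flip the sign of r).

```python
from math import sqrt


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (std_x * std_y)


# Illustrative data, not taken from the report.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 6.0]
r = pearson_r(x, y)

# Change location and (positive) scale of each variable separately.
x_shifted = [10 * a + 3 for a in x]
y_shifted = [0.5 * b - 7 for b in y]
r_shifted = pearson_r(x_shifted, y_shifted)

print(r, r_shifted)  # the two values agree up to rounding
```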

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
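
The pair-counting definition can be written directly as below. This is a sketch of the simplest variant (often called tau-a), in which tied pairs count as neither concordant nor discordant; libraries often use tie-adjusted variants instead, so results can differ when the data contain ties. The function name is illustrative.

```python
from itertools import combinations


def kendall_tau(x, y):
    """(concordant - discordant) / total pairs; tied pairs count as neither."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables move the same way
        elif direction < 0:
            discordant += 1   # the variables move in opposite directions
    total_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / total_pairs


# One swapped pair out of six: 5 concordant, 1 discordant.
print(kendall_tau([1, 2, 3, 4], [1, 3, 2, 4]))
```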

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
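
As a rough sketch of what a bias-corrected Cramér's V looks like, the function below applies the commonly cited form of Bergsma's correction to the chi-squared statistic of a contingency table. It is an illustration, not the report's implementation: the function name is hypothetical, and returning NaN when one variable is constant is a choice made here.

```python
from collections import Counter
from math import sqrt


def cramers_v_corrected(a, b):
    """Bias-corrected Cramér's V for two equally long sequences of labels."""
    n = len(a)
    row_counts, col_counts = Counter(a), Counter(b)
    observed = Counter(zip(a, b))

    # Pearson chi-squared statistic of the contingency table.
    chi2 = 0.0
    for row, row_n in row_counts.items():
        for col, col_n in col_counts.items():
            expected = row_n * col_n / n
            chi2 += (observed[(row, col)] - expected) ** 2 / expected

    r, k = len(row_counts), len(col_counts)
    phi2 = chi2 / n
    # Subtract the expected bias, then shrink the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    if denom <= 0:
        # Assumption: the measure is treated as undefined for a constant variable.
        return float("nan")
    return sqrt(phi2_corr / denom)


labels = ["x", "y"] * 50
print(cramers_v_corrected(labels, labels))                    # perfect association
print(cramers_v_corrected([0, 0, 1, 1] * 25, [0, 1] * 50))    # independent columns
```

A constant column such as country has a single category, so the denominator vanishes and this sketch returns NaN for any pairing that involves it.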

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available.

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

      df_index  id    country  state  city  station  pollutant_min  pollutant_max  pollutant_avg  pollutant_id_CO  pollutant_id_NH3  pollutant_id_NO2  pollutant_id_OZONE  pollutant_id_PM10  pollutant_id_PM2.5  pollutant_id_SO2
0     0         1     0        0      6     215      69.0           109.0          86.0           0                0                 0                 0                   0                  1                   0
1     1         2     0        0      6     215      82.0           138.0          105.0          0                0                 0                 0                   1                  0                   0
2     2         3     0        0      6     215      10.0           42.0           19.0           0                0                 1                 0                   0                  0                   0
3     3         4     0        0      6     215      4.0            5.0            4.0            0                1                 0                 0                   0                  0                   0
4     4         5     0        0      6     215      16.0           42.0           27.0           0                0                 0                 0                   0                  0                   1
5     5         6     0        0      6     215      15.0           45.0           32.0           1                0                 0                 0                   0                  0                   0
6     6         7     0        0      6     215      4.0            82.0           42.0           0                0                 0                 1                   0                  0                   0
7     7         8     0        0      113   3        47.0           111.0          71.0           0                0                 0                 0                   0                  1                   0
8     8         9     0        0      113   3        49.0           120.0          86.0           0                0                 0                 0                   1                  0                   0
9     9         10    0        0      113   3        11.0           44.0           23.0           0                0                 1                 0                   0                  0                   0

Last rows

      df_index  id    country  state  city  station  pollutant_min  pollutant_max  pollutant_avg  pollutant_id_CO  pollutant_id_NH3  pollutant_id_NO2  pollutant_id_OZONE  pollutant_id_PM10  pollutant_id_PM2.5  pollutant_id_SO2
1728  1826      1827  0        25     77    196      2.0            15.0           4.0            0                0                 0                 0                   0                  0                   1
1729  1827      1828  0        25     77    196      31.0           85.0           39.0           1                0                 0                 0                   0                  0                   0
1730  1828      1829  0        25     77    196      6.0            76.0           31.0           0                0                 0                 1                   0                  0                   0
1731  1829      1830  0        25     77    269      28.0           75.0           54.0           0                0                 0                 0                   0                  1                   0
1732  1830      1831  0        25     77    269      36.0           101.0          74.0           0                0                 0                 0                   1                  0                   0
1733  1831      1832  0        25     77    269      10.0           22.0           15.0           0                0                 1                 0                   0                  0                   0
1734  1832      1833  0        25     77    269      1.0            3.0            2.0            0                1                 0                 0                   0                  0                   0
1735  1833      1834  0        25     77    269      6.0            28.0           10.0           0                0                 0                 0                   0                  0                   1
1736  1834      1835  0        25     77    269      34.0           92.0           41.0           1                0                 0                 0                   0                  0                   0
1737  1835      1836  0        25     77    269      10.0           116.0          43.0           0                0                 0                 1                   0                  0                   0
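
The seven pollutant_id_* indicator columns look like a one-hot encoding of a single pollutant field. One way to check this from the report alone is to confirm that the counts of 1s reported in the variable summaries add up to the number of observations. The dictionary below simply restates those counts; the variable names are illustrative.

```python
# Counts of value 1 copied from each pollutant_id_* variable summary.
ones_per_indicator = {
    "pollutant_id_CO": 269,
    "pollutant_id_NH3": 218,
    "pollutant_id_NO2": 251,
    "pollutant_id_OZONE": 257,
    "pollutant_id_PM10": 245,
    "pollutant_id_PM2.5": 252,
    "pollutant_id_SO2": 246,
}
n_observations = 1738

# If every row sets exactly one indicator, the totals must match.
total_ones = sum(ones_per_indicator.values())
print(total_ones == n_observations)
```

Matching totals are consistent with exactly one indicator being set per row, although the counts alone do not prove it.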